Outliers Detection in Statistical Quality Control Using Support Vector Data Description

Authors

  • Birajashis Pattnaik
  • S. N. Murty
Abstract

Outliers have been informally defined as observations in a data set that appear to be inconsistent with the remainder of that set of data, or that deviate so much from other observations as to arouse suspicions that they were generated by a different mechanism [1]. Identifying outliers can lead to the discovery of useful knowledge and has a number of practical applications. The boundary of a dataset can be used to detect novel data or outliers. Outlier detection is among the most important tasks in data analysis. Outliers describe abnormal data behavior, i.e. data that deviate from the natural variability of the data. Outlier detection has many applications, such as data cleaning and fraud detection. Outliers are frequently removed to improve the accuracy of estimators, but sometimes the presence of an outlier carries a meaning whose explanation is lost if the outlier is deleted. Often outliers are of primary interest; in geochemical exploration, for example, they indicate mineral deposits. The cut-off value, or threshold, that numerically separates anomalous from non-anomalous data is often the basis for important decisions. Here we seek the representation of a dataset under which the target class can best be distinguished from the rest of the data. In this paper, outliers in a dataset of the percentages of different ingredients used in copper production are studied using support vector data description (SVDD) to assess product quality.
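
A minimal sketch of the technique named in the title, assuming the standard SVDD formulation of Tax and Duin (the smallest sphere in kernel space that encloses the target data) with a Gaussian kernel. The paper's own solver, kernel width, and copper dataset are not given here, so the class `SVDD`, the helper `rbf`, the parameters `C` and `gamma`, the SMO-style pairwise solver, and the toy grid data are all illustrative assumptions. The squared radius `R2` plays the role of the cut-off value described above: a point whose squared kernel distance to the sphere centre exceeds it is flagged as an outlier.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def rbf(x: Point, y: Point, gamma: float) -> float:
    """Gaussian kernel K(x, y) = exp(-gamma * ||x - y||^2), so K(x, x) = 1."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(x, y)))


class SVDD:
    """Hypothetical minimal SVDD sketch (not the authors' implementation).

    Dual problem:  minimise   a^T K a - sum_i a_i K_ii
                   subject to sum_i a_i = 1 and 0 <= a_i <= C,
    solved with SMO-style updates on the maximally violating pair.
    """

    def __init__(self, C: float = 1.0, gamma: float = 0.5,
                 tol: float = 1e-6, max_iter: int = 100_000):
        self.C, self.gamma, self.tol, self.max_iter = C, gamma, tol, max_iter

    def fit(self, X: List[Point]) -> "SVDD":
        n = len(X)
        if self.C * n <= 1:
            raise ValueError("C must exceed 1/n for a non-trivial solution")
        K = [[rbf(X[i], X[j], self.gamma) for j in range(n)] for i in range(n)]
        a = [1.0 / n] * n  # feasible start: weights sum to 1
        # Gradient of the dual objective: g_k = 2 (K a)_k - K_kk
        g = [2 * sum(K[k][m] * a[m] for m in range(n)) - K[k][k]
             for k in range(n)]
        eps = 1e-12
        for _ in range(self.max_iter):
            up = [k for k in range(n) if a[k] < self.C - eps]  # may grow
            down = [k for k in range(n) if a[k] > eps]         # may shrink
            i = min(up, key=lambda k: g[k])
            j = max(down, key=lambda k: g[k])
            if g[j] - g[i] < self.tol:
                break  # KKT conditions hold within tol
            # Move weight delta from a_j to a_i (keeps sum(a) == 1),
            # clipped so that both stay inside [0, C].
            eta = max(K[i][i] + K[j][j] - 2 * K[i][j], eps)
            delta = min((g[j] - g[i]) / (2 * eta), self.C - a[i], a[j])
            a[i] += delta
            a[j] -= delta
            for k in range(n):
                g[k] += 2 * delta * (K[k][i] - K[k][j])
        self.X, self.a = X, a
        self.aKa = sum(a[p] * a[q] * K[p][q] for p in range(n) for q in range(n))
        # R^2 = squared distance of unbounded support vectors (0 < a_s < C)
        free = [s for s in range(n) if 1e-8 < a[s] < self.C - 1e-8]
        if free:
            self.R2 = sum(self.dist2(X[s]) for s in free) / len(free)
        else:  # degenerate case: largest distance among non-bounded points
            self.R2 = max(self.dist2(X[s]) for s in range(n)
                          if a[s] < self.C - 1e-8)
        return self

    def dist2(self, z: Point) -> float:
        """Squared kernel distance ||phi(z) - centre||^2."""
        cross = sum(ai * rbf(xi, z, self.gamma) for ai, xi in zip(self.a, self.X))
        return rbf(z, z, self.gamma) - 2 * cross + self.aKa

    def is_outlier(self, z: Point) -> bool:
        """Flag z when its distance exceeds the radius (the cut-off value)."""
        return self.dist2(z) > self.R2


if __name__ == "__main__":
    grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
    train = [(x, y) for x in grid for y in grid]  # toy target class, 25 points
    model = SVDD(C=1.0, gamma=0.5).fit(train)
    print(model.is_outlier((0.0, 0.0)), model.is_outlier((5.0, 5.0)))
```

With `C=1` every training point is forced inside the sphere; lowering `C` towards `1/n` lets some training points fall outside, which is the usual way to tolerate outliers already present in the training data. For a Gaussian kernel, where K(x, x) is constant, SVDD is known to be equivalent to the one-class SVM, so scikit-learn's `OneClassSVM` is a practical alternative to a hand-written solver.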

Publication date: 2005